A Multi-layer Chinese Word Segmentation System Optimized for Out-of-domain Tasks
Abstract
State-of-the-art Chinese word segmentation systems achieve high performance when the training and test data come from the same domain. However, they generalize poorly when applied to test data from a different domain. We introduce a multi-layer Chinese word segmentation system that integrates the outputs of multiple heterogeneous segmentation systems. By training a second-layer large-margin classifier on top of the outputs of several Conditional Random Fields classifiers, the system can use a small amount of in-domain training data to improve performance. Experimental results show consistent improvements in F1 score and OOV recall.
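The two-layer idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the second-layer features or the exact large-margin learner, so everything below is an assumption made for illustration: the base systems' outputs are hard-coded hypothetical B/I tag sequences (B = character begins a word, I = character continues a word) rather than real CRF output, the features are simply each base system's tag at the current character, and a margin perceptron stands in for the large-margin classifier.

```python
from collections import defaultdict

TAGS = ("B", "I")  # B = begins a word, I = continues a word


def features(base_tags, i):
    """Second-layer features for character i: the tag each base system chose."""
    feats = ["bias"]
    for k, tags in enumerate(base_tags):
        feats.append(f"sys{k}={tags[i]}")
    return feats


def score(weights, feats, tag):
    return sum(weights[(f, tag)] for f in feats)


def train(examples, epochs=10, margin=1.0):
    """Margin perceptron (an assumed stand-in for the large-margin layer).

    Updates whenever the gold tag fails to beat the best rival by `margin`.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for base_tags, gold in examples:
            for i, g in enumerate(gold):
                feats = features(base_tags, i)
                rival = max((t for t in TAGS if t != g),
                            key=lambda t: score(weights, feats, t))
                if score(weights, feats, g) - score(weights, feats, rival) < margin:
                    for f in feats:
                        weights[(f, g)] += 1.0
                        weights[(f, rival)] -= 1.0
    return weights


def predict(weights, base_tags):
    """Combine the base systems' tags into one tag sequence."""
    return [max(TAGS, key=lambda t: score(weights, features(base_tags, i), t))
            for i in range(len(base_tags[0]))]


def tags_to_words(chars, tags):
    """Turn per-character B/I tags back into a list of words."""
    words = []
    for ch, t in zip(chars, tags):
        if t == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


# Small in-domain training set: (hypothetical base-system tags, gold tags).
# In this toy data system 1 always matches the gold segmentation.
examples = [
    ([list("BIBI"), list("BBBI")], list("BBBI")),  # 我 / 爱 / 北京
    ([list("BBI"), list("BIB")], list("BIB")),     # 北京 / 人
]
weights = train(examples)

# The base systems disagree on a new sentence; the second layer arbitrates.
new_base_tags = [list("BBI"), list("BIB")]
combined = predict(weights, new_base_tags)
words = tags_to_words("上海人", combined)
print(words)
```

On this toy data the second layer learns to trust the base system that agrees with the in-domain gold segmentation. A real system would presumably use richer features, such as character context and base-system confidence, but the abstract does not say which.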
Similar resources
An Automated MR Image Segmentation System Using Multi-layer Perceptron Neural Network
Background: Brain tissue segmentation for delineation of 3D anatomical structures from magnetic resonance (MR) images can be used for neuro-degenerative disorders, characterizing morphological differences between subjects based on volumetric analysis of gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF), but only if the obtained segmentation results are correct. Due to image arti...
Multi-task Multi-domain Representation Learning for Sequence Tagging
Many domain adaptation approaches rely on learning cross domain shared representations to transfer the knowledge learned in one domain to other domains. Traditional domain adaptation only considers adapting for one task. In this paper, we explore multi-task representation learning under the domain adaptation scenario. We propose a neural network framework that supports domain adaptation for mul...
Multi-task Domain Adaptation for Sequence Tagging
Many domain adaptation approaches rely on learning cross domain shared representations to transfer the knowledge learned in one domain to other domains. Traditional domain adaptation only considers adapting for one task. In this paper, we explore multi-task representation learning under the domain adaptation scenario. We propose a neural network framework that supports domain adaptation for mul...
Rules-based Chinese Word Segmentation on MicroBlog for CIPS-SIGHAN on CLP2012
In this evaluation, we have taken part in the task of the Word Segmentation on Chinese MicroBlog. In this task, after analysing the feature of the MicroBlog and the result of our original Chinese word segmentation system, four Optimization Rules are proposed to optimize the segmentation algorithm for Chinese word segmentation on MicroBlog corpora. The optimized segmentation system is based on c...
Improvement of Frequency Fluctuations in Microgrids Using an Optimized Fuzzy P-PID Controller by Modified Multi Objective Gravitational Search Algorithm
Microgrids are a new opportunity to reduce the total cost of power generation and to meet energy demand through small-scale power sources such as wind turbines, photovoltaic panels, battery banks, fuel cells, etc. As in any power system, unexpected faults or load shifting in a microgrid (MG) lead to frequency oscillations. Hence, this paper employs an adaptive fuzzy P-PID controller for f...
Publication date: 2010